Abstract

We consider the problem of recovery of an unknown multivariate signal $f$ observed in
a $d$-dimensional Gaussian white noise model of intensity $\varepsilon$. We assume that $f$ belongs to a
class of smooth functions $\mathcal{F}^d \subset L_2([0,1]^d)$ and has an additive sparse structure determined
by the parameter $s$, the number of non-zero univariate components contributing to $f$. We
are interested in the case when $d = d_\varepsilon \to \infty$ as $\varepsilon \to 0$ and the parameter $s$ stays "small"
relative to $d$. With these assumptions, the recovery problem in hand becomes that of
determining which sparse additive components are non-zero. Attempting to reconstruct
most non-zero components of $f$, but not all of them, we arrive at the problem of almost
full variable selection in high-dimensional regression. For two different choices of $\mathcal{F}^d$ we
establish conditions under which almost full variable selection is possible, and provide a
procedure that gives almost full variable selection. The procedure does the best (in the
asymptotically minimax sense) in selecting most non-zero components of $f$. Moreover, it
is adaptive in the parameter $s$.
1 Introduction

In recent years, there has been much work on methods for variable selection in high-dimensional
settings; refer, for example, to [3, 6, 8, 18] and references therein. Among a variety
of methods proposed, the lasso has become an important tool for sparse high-dimensional
regression problems. Motivated by the fact that finding the lasso solutions is computationally
demanding, Genovese et al. [6] studied the relative statistical performance of the lasso and 
marginal regression, which is also known as simple thresholding, for sparse high-dimensional 
regression problems. They found that marginal regression, where each dependent variable is 
regressed separately on each covariate, provides a good alternative to the lasso, and concluded 
that their procedure merits further study. Handling the problem of reconstruction in
high-dimensional regression, Genovese et al. [6] distinguished between the cases of exact, almost full,
and no recovery. Exact recovery refers to the situation where the set of all relevant components 
can be consistently recovered (asymptotically). Almost full recovery stands for the possibility 
of having the number of misclassified components negligibly small as compared to the number 
of all relevant components. The latter strategy requires milder restrictions on a statistical 
model and can be used in the situations where exact recovery is impossible. If neither exact
nor almost full recovery can be achieved, we speak of ‘no recovery’ when the optimal risk is as 
large as the number of relevant components and any recovery procedure fails completely. 

Ingster and Stepanova [13] extended the idea of Genovese et al. [6] to the case of
nonparametric regression. Specifically, they addressed the problem of recovering sparse additive
smooth signals observed in the continuous regression model and showed that, asymptotically, 
as dimension increases indefinitely, exact variable selection is possible and is provided by a 
suitable thresholding procedure. The procedure in [13] is optimal in the asymptotically
minimax sense. It is also free from the sparsity parameter and thus is adaptive. At the same time,
the more intricate problem of almost full recovery in an adaptive setup remained unsolved. 
We shall treat this problem in the present paper. 

Our setting is that of a multivariate signal $f \in \mathcal{F}^d \subset L_2([0,1]^d) = L_2^d$ corrupted by a
Gaussian white noise of a given intensity $\varepsilon$:

$$X_\varepsilon = f + \varepsilon W, \tag{1}$$

where $W$ is a $d$-dimensional Gaussian white noise on $[0,1]^d$, $\varepsilon > 0$ is a noise intensity, and
$\mathcal{F}^d$ is a subset of $L_2^d$ that consists of sufficiently smooth functions. In the present paper, two
examples of $\mathcal{F}^d$ will be considered. In this model, the "observation" is the function
$X_\varepsilon : L_2^d \to \mathcal{G}$ taking its values in the set $\mathcal{G}$ of normal random variables such that if
$\xi = X_\varepsilon(\phi)$, $\eta = X_\varepsilon(\psi)$, where $\phi, \psi \in L_2^d$, then $\mathrm{E}(\xi) = (f, \phi)$, $\mathrm{E}(\eta) = (f, \psi)$, and
$\mathrm{Cov}(\xi, \eta) = \varepsilon^2 (\phi, \psi)$. For any $f \in L_2^d$, the observation $X_\varepsilon$ determines the Gaussian
measure $P_{\varepsilon,f}$ on the Hilbert space $L_2^d$ with mean function $f$ and covariance operator
$\varepsilon^2 I$, where $I$ is the identity operator (see [9, 19] for references). The expectation that
corresponds to the probability measure $P_{\varepsilon,f}$ is denoted by $\mathrm{E}_{\varepsilon,f}$. In this paper, the case of
growing dimension $d = d_\varepsilon \to \infty$ as $\varepsilon \to 0$ is studied. It is well known that the continuous
model (1) serves as a good approximation to a more realistic equidistant sampling scheme with
discrete Gaussian white noise. In such an approximation, $\varepsilon^{-2}$ roughly corresponds to the
number $n$ of observations per unit cube $[0,1]^d$.

An important problem in this context is to recover $f$ from noisy data. Attempting to
suppress the curse of dimensionality and complement the findings in [13], we assume that $f$
has an additive sparse structure. Our goal is to study under what conditions and by means of
what procedure almost full recovery of an additive sparse signal $f$ is possible. In other words,
we wish to correctly identify most non-zero components of $f$. In doing so, we aim at providing
the procedure that, for the two function spaces of our interest, one consisting of functions 
of finite smoothness and the other consisting of functions of infinite smoothness, is optimal 
in the asymptotically minimax sense. In the almost full recovery regime, one can detect even 
smaller relevant components but, unfortunately, at the price of a loss in the rate. Therefore 
constructing the corresponding procedure is technically more demanding as compared to that 
in the exact recovery case. To develop a good almost full recovery procedure, we will use 
results from minimax hypothesis testing and minimax estimation theory. 

To fix some notation and assumptions, let the signal $f$ in model (1) be of the form (see,
for example, [5] and [13])

$$f(x) = \sum_{j=1}^{d} \eta_j f_j(x_j), \quad x = (x_1, \ldots, x_d) \in [0,1]^d, \quad \eta = (\eta_1, \ldots, \eta_d) \in \mathcal{H}_{d,s},$$

where for a number $s \in \{1, \ldots, d\}$, called the sparsity parameter,

$$\mathcal{H}_{d,s} = \Big\{ \eta = (\eta_1, \ldots, \eta_d) : \eta_j \in \{0,1\},\ 1 \le j \le d,\ \sum_{j=1}^{d} \eta_j = s \Big\}.$$

The $\eta_j$'s are non-random quantities taking values 0 and 1; the case $\eta_j = 1$ ($\eta_j = 0$) corresponds
to the situation when the component $f_j$ is active (non-active). When $s = o(d)$ we speak of
a sparse additive signal $f$. In addition, each component $f_j$ is assumed to be an element
of a certain smooth function space $\mathcal{F}_\sigma \subset L_2[0,1]$ depending on a known parameter $\sigma > 0$;
two examples of $\mathcal{F}_\sigma$ under study are introduced in Section 2. Thus, the class of $s$-sparse
multivariate signals of interest is

$$\mathcal{F}^d_{s,\sigma} = \Big\{ f : f(x) = \sum_{j=1}^{d} \eta_j f_j(x_j),\ \int_0^1 f_j(x)\,dx = 0,\ f_j \in \mathcal{F}_\sigma,\ 1 \le j \le d,\ \eta = (\eta_j) \in \mathcal{H}_{d,s} \Big\},$$

where the components satisfy a side condition that guarantees uniqueness, and the signal
recovery problem becomes that of determining which sparse additive components are non-zero.

In the context of variable selection, the problem of reconstruction of an additive function
$f$ is now stated as follows. For each component $f_j$ of a signal $f \in \mathcal{F}^d_{s,\sigma}$, consider testing the
hypothesis of no signal $H_{0j} : f_j = 0$ versus the alternative $H_{1j} : f_j \in \mathcal{F}_\sigma(r_\varepsilon)$, where for a
positive family $r_\varepsilon \to 0$

$$\mathcal{F}_\sigma(r_\varepsilon) = \{ g \in \mathcal{F}_\sigma : \|g\|_\sigma \le 1,\ \|g\|_2 \ge r_\varepsilon \}, \tag{2}$$

and $\|\cdot\|_\sigma$ is a norm on $\mathcal{F}_\sigma$. In this problem, a precise demarcation between the signals that
can be detected with error probabilities tending to 0 and the signals that cannot be detected
is given in terms of a detection boundary, or separation rate, $r^*_\varepsilon \to 0$ as $\varepsilon \to 0$. For various
function classes frequently used in minimax hypothesis testing, sharp asymptotics for $r^*_\varepsilon$ are
available (see, for example, [10]). The hypotheses $H_{0j}$ and $H_{1j}$ separate asymptotically (that
is, the minimax error probability tends to zero) if $r_\varepsilon / r^*_\varepsilon \to \infty$ as $\varepsilon \to 0$. The hypotheses
$H_{0j}$ and $H_{1j}$ merge asymptotically (that is, the minimax error probability tends to one) if
$r_\varepsilon / r^*_\varepsilon \to 0$ as $\varepsilon \to 0$.

When $H_{0j}$ and $H_{1j}$ separate asymptotically, we say that $f_j$ is detectable. If the hypotheses
$H_{0j}$ and $H_{1j}$ separate (merge) asymptotically when $\liminf r_\varepsilon / r^*_\varepsilon > 1$ ($\limsup r_\varepsilon / r^*_\varepsilon < 1$), the
detection boundary $r^*_\varepsilon$ is said to be sharp. The knowledge of a sharp detection boundary $r^*_\varepsilon$
allows us to have a meaningful problem of testing $H_{0j} : f_j = 0$ versus $H_{1j} : f_j \in \mathcal{F}_\sigma(r_\varepsilon)$ by
choosing $r_\varepsilon$ so that $\liminf_{\varepsilon \to 0} r_\varepsilon / r^*_\varepsilon > 1$. Otherwise, the function $f_j$ will be too "small" to be
noticeable.

Let us agree to say that any measurable function $\eta^* = \eta^*(X_\varepsilon)$ taking values on $\{0,1\}^d$
is a selector. Following [6] and [13], we judge the quality of a selector $\eta^*$ of vector $\eta \in \mathcal{H}_{d,s}$
by using the Hamming distance on $\{0,1\}^d$, which counts the number of positions at which
$\eta^* = (\eta^*_1, \ldots, \eta^*_d)$ and $\eta = (\eta_1, \ldots, \eta_d)$ differ:

$$|\eta^* - \eta| = \sum_{j=1}^{d} |\eta^*_j - \eta_j|.$$

Following [6], we distinguish between exact and almost full recovery. Roughly, a selector
$\eta^* = \eta^*(X_\varepsilon)$ is asymptotically exact if its maximum risk is $o(1)$. Likewise, a selector
$\eta^* = \eta^*(X_\varepsilon)$ is asymptotically almost full if its maximum risk is $o(s)$ with $s$ being the number
of non-zero components $f_j$ of a signal $f = \sum_{j=1}^{d} \eta_j f_j$.
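The Hamming loss used to judge a selector can be computed directly. The following is a minimal Python sketch; the function name `hamming_loss` and the use of NumPy arrays are illustrative assumptions, not part of the paper.

```python
import numpy as np

def hamming_loss(eta_star, eta):
    """Number of positions at which two 0/1 vectors differ.

    Illustrative helper (hypothetical name): eta_star plays the role of a
    selector and eta the role of the true activity pattern.
    """
    eta_star = np.asarray(eta_star, dtype=int)
    eta = np.asarray(eta, dtype=int)
    if eta_star.shape != eta.shape:
        raise ValueError("selector and target must have the same length")
    # For 0/1 entries, |a - b| equals 1 exactly where the entries differ.
    return int(np.sum(np.abs(eta_star - eta)))
```

In this vocabulary, exact recovery asks the expected loss to vanish, while almost full recovery only asks it to be small relative to the number of active components.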

Ingster and Stepanova [13] have obtained an adaptive procedure that gives asymptotically
exact reconstruction of a $\sigma$-smooth signal $f \in \mathcal{F}^d_{s,\sigma}$ observed in a $d$-dimensional Gaussian
white noise model. A similar result for the space of infinitely-smooth functions is stated in
this paper in Section 4.2 (see Theorems 1 and 2). Although the selector in Section 4.2 is
based on somewhat different statistics when compared to the one in [13], both selectors have
one common feature: their thresholds are free of the sparsity parameter $s$ and therefore
automatically adapt themselves to its values.

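The goal of this paper is three-fold. First, we find a sharp detection boundary that allows
us to separate detectable components of a signal $f \in \mathcal{F}^d_{s,\sigma}$ from non-detectable ones. Next,
assuming that all active components $f_j$ are detectable and that $s$ belongs to a set $S_d$, which
puts some mild restrictions on the range of $s$, we construct a selector $\eta^* = \eta^*(X_\varepsilon)$ with the
property

$$\sup_{s \in S_d}\ \sup_{\eta \in \mathcal{H}_{d,s}}\ \sup_{f}\ s^{-1}\, \mathrm{E}_{\varepsilon,f} |\eta^* - \eta| \to 0, \quad \text{as } \varepsilon \to 0. \tag{3}$$

Finally, we show that if at least one of the $f_j$'s is undetectable, then

$$\liminf_{\varepsilon \to 0}\ \inf_{\tilde\eta}\ \sup_{\eta \in \mathcal{H}_{d,s}}\ \sup_{f}\ s^{-1}\, \mathrm{E}_{\varepsilon,f} |\tilde\eta - \eta| > 0, \tag{4}$$

that is, almost full recovery is impossible. In (3) and (4), the innermost supremum extends over
the signals $f \in \mathcal{F}^d_{s,\sigma}$ that satisfy the respective detectability assumption, and the infimum in (4)
is over all selectors $\tilde\eta$.

The selector $\eta^*$ that satisfies (3) is said to provide asymptotically almost full recovery of
a signal $f \in \mathcal{F}^d_{s,\sigma}$ in model (1); its maximum risk is small relative to the number of non-zero
components. If, in addition, inequality (4) holds true, then the selection procedure based on
$\eta^*$ is the best possible (in the asymptotically minimax sense). The notion of optimality that
we use is borrowed from the minimax hypothesis testing theory.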

In the present setup, adaptive (in s) variable selection in high dimensions presents several 
challenges. First, one has to construct a good non-adaptive selector. Second, having that 
selector available, one has to adapt it to unknown values of the parameter s. It turns out 
that, when s is known, both exact and almost full recovery can be achieved by a suitably 
designed thresholding procedure (see Section 3.1 for details). The problem of adaptation of 
this procedure to unknown values of s was tackled and solved in [13], but in the case of 
exact recovery only. Handling the same problem in the almost full recovery case will bring 
us in this paper to the use of Lepski’s method. This method was proposed for adaptive 
estimation in a Gaussian white noise model. The reason why adaptive reconstruction of most
relevant components of $f$ turns out to be more challenging than adaptive reconstruction of all
components of $f$ lies in the very nature of the thresholding procedure as defined in (20). In
contrast to the exact selector given by (22) whose threshold is set regardless of the value of $s$,
thresholding in (20) does depend on $s$.

The paper is organized as follows. In Section 2 we present some general results of the 
asymptotically minimax hypothesis testing theory and provide details on their use for the 
two function spaces of our interest. In Section 3 we translate the initial problem to the one 
in terms of the Fourier coefficients and, for both function spaces in hand, obtain almost full 
selectors for a known sparsity parameter s. In addition to that, we derive conditions under 
which almost full variable selection is possible. Adaptive selectors for the function spaces in 
hand are developed in Section 4. To complete the picture, we also introduce an adaptive 
selection procedure that gives exact reconstruction for the space of analytic functions. Our 
main results are stated in Section 4 and proved in Section 5. 
2 The building blocks


As in [13], the recovery problem under study will be connected to that of hypothesis testing. 
Before stating and proving our main results, we shall discuss some important tools of minimax 
hypothesis testing that will be used in the subsequent sections. For a complete exposition of
the subject, see [15] and the review papers [10, 11, 12]. 


2.1 Extreme problem for ellipsoids: general case 

In asymptotically minimax hypothesis testing, when dealing with classes of smooth functions,
the first common step is to transform the initial problem involving a class of functions to the
corresponding problem in the space of Fourier coefficients. For this, let $\{\phi_k(x)\}_{k \in \mathbb{Z}}$ be the
orthonormal basis in $L_2[0,1]$ given by

$$\phi_0(x) = 1, \quad \phi_k(x) = \sqrt{2} \cos(2\pi k x), \quad \phi_{-k}(x) = \sqrt{2} \sin(2\pi k x), \quad k > 0.$$

If $g \in L_2[0,1]$, then $g(x) = \sum_{k \in \mathbb{Z}} \theta_k \phi_k(x)$, where $\theta_k = (g, \phi_k)$ is the $k$th Fourier coefficient
of $g$, and $\|g\|_2^2 = \sum_{k \in \mathbb{Z}} \theta_k^2$. Let $\mathcal{F}_\sigma$ be a function space depending on a parameter $\sigma > 0$ that
is a subset of $L_2[0,1]$. Suppose that $g \in \mathcal{F}_\sigma \subset L_2[0,1]$ is observed in a univariate Gaussian
white noise of intensity $\varepsilon$, and we wish to test the null hypothesis $H_0 : g = 0$ versus a
sequence of alternatives $H_1 : g \in \mathcal{F}_\sigma(r_\varepsilon)$, where the set $\mathcal{F}_\sigma(r_\varepsilon)$ is given by (2). For the two
function spaces of interest, the norm of an element $g$ is expressed as $\|g\|_\sigma^2 = \sum_{k \in \mathbb{Z}} \theta_k^2 c_k^2$ with
specified coefficients $c_k = c_k(\sigma)$ (see formulas (8) and (12) below). In the sequence space of
Fourier coefficients, the set $\mathcal{F}_\sigma(r_\varepsilon)$ corresponds to the ellipsoid in the space $l_2(\mathbb{Z})$ with semi-axes
$c_k = c_k(\sigma)$ and a small neighbourhood of the point $\theta = 0$ removed:

$$\Theta_\sigma(r_\varepsilon) = \Big\{ \theta = (\theta_k)_{k \in \mathbb{Z}} \in l_2(\mathbb{Z}) : \sum_{k \in \mathbb{Z}} \theta_k^2 c_k^2 \le 1,\ \sum_{k \in \mathbb{Z}} \theta_k^2 \ge r_\varepsilon^2 \Big\}. \tag{5}$$

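For constructing an asymptotically almost full selector, we shall need some facts from the
minimax theory of hypothesis testing. Denote by $\theta^*(r_\varepsilon) = (\theta^*_k(r_\varepsilon))_{k \in \mathbb{Z}}$ the solution to the
extreme problem

$$\frac{1}{2\varepsilon^4} \sum_{k \in \mathbb{Z}} \theta_k^4 \to \inf_{\theta \in \Theta_\sigma(r_\varepsilon)}, \tag{6}$$

and let $u_\varepsilon^2(r_\varepsilon) = u_\varepsilon^2(\Theta_\sigma(r_\varepsilon))$ be the value of the problem, that is,

$$u_\varepsilon^2(r_\varepsilon) = \frac{1}{2\varepsilon^4} \inf_{\theta \in \Theta_\sigma(r_\varepsilon)} \sum_{k \in \mathbb{Z}} \theta_k^4 = \frac{1}{2\varepsilon^4} \sum_{k \in \mathbb{Z}} (\theta^*_k(r_\varepsilon))^4.$$

The function $u_\varepsilon(r_\varepsilon)$ plays a key role in the minimax theory of hypothesis testing. It controls
the minimax total error probability and is used to set a cut-off point of the asymptotically
minimax test procedure. The detection boundary $r^*_\varepsilon$ in the problem of testing $H_0 : \theta = 0$
versus $H_1 : \theta \in \Theta_\sigma(r_\varepsilon)$ is determined by the relation $u_\varepsilon(r^*_\varepsilon) \asymp 1$. The function $u_\varepsilon(r_\varepsilon)$ is a
non-decreasing function of the argument $r_\varepsilon$ which possesses a kind of 'continuity' property.
Namely, for any $\epsilon > 0$ there exist $\Delta > 0$ and $\varepsilon_0 > 0$ such that for any $\delta \in (0, \Delta)$ and $\varepsilon \in (0, \varepsilon_0)$,

$$u_\varepsilon(r_\varepsilon) \le u_\varepsilon((1 + \delta) r_\varepsilon) \le (1 + \epsilon)\, u_\varepsilon(r_\varepsilon). \tag{7}$$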
These and some other facts about $u_\varepsilon(r_\varepsilon)$ can be found in [10, Sec. 3.2] and [15, Sec. 5.2.3].

For standard function spaces with the norm $\|\cdot\|_\sigma$ defined (under the periodic constraints)
in terms of Fourier coefficients as $\|g\|_\sigma^2 = \sum_{k \in \mathbb{Z}} \theta_k^2 c_k^2$, the form of the extremal sequence
$(\theta^*_k(r_\varepsilon))_{k \in \mathbb{Z}}$ in problem (6) as well as the sharp asymptotics for $u_\varepsilon(r_\varepsilon)$ are available. Below we
cite some relevant results for the two function spaces of our interest: the Sobolev space
of periodic $\sigma$-smooth functions on $\mathbb{R}$ and the space of periodic functions on $\mathbb{R}$ that admit an
analytic continuation to the strip around the real line.

2.2 Extreme problem for Sobolev ellipsoids 

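Let $\mathcal{F}_\sigma$ with $\sigma > 0$ denote the Sobolev space of $\sigma$-smooth 1-periodic functions on $\mathbb{R}$. Define
the norm $\|\cdot\|_\sigma$ on $\mathcal{F}_\sigma$ by the formula

$$\|f\|_\sigma^2 = \sum_{k \in \mathbb{Z}} \theta_k^2 c_k^2, \quad c_k = c_k(\sigma) = (2\pi |k|)^\sigma, \tag{8}$$

where $\theta_k$ is the $k$th Fourier coefficient of $f$ with respect to $\{\phi_k(x)\}_{k \in \mathbb{Z}}$. If $\sigma$ is an integer, then
under the periodic constraints (when the function admits a 1-periodic $[\sigma]$-smooth extension on
the real line) the norm as in (8) corresponds to

$$\|f\|_\sigma^2 = \int_0^1 \big(f^{(\sigma)}(x)\big)^2\,dx.$$

For a function $f \in \mathcal{F}_\sigma$ consider testing the hypothesis $H_0 : f = 0$ versus the alternative
$H_1 : f \in \mathcal{F}_\sigma(r_\varepsilon)$, where for a positive family $r_\varepsilon \to 0$

$$\mathcal{F}_\sigma(r_\varepsilon) = \{ f \in \mathcal{F}_\sigma : \|f\|_\sigma \le 1,\ \|f\|_2 \ge r_\varepsilon \}.$$

Switching from Sobolev balls $\{f \in \mathcal{F}_\sigma : \|f\|_\sigma \le 1\}$ to Sobolev ellipsoids
$\{\theta \in l_2(\mathbb{Z}) : \sum_{k \in \mathbb{Z}} \theta_k^2 c_k^2 \le 1\}$ leads to the problem of testing $H_0 : \theta = 0$ versus
$H_1 : \theta \in \Theta_\sigma(r_\varepsilon)$. The test procedure that does the best in distinguishing between the latter
two hypotheses is obtained by solving the extreme problem (6) with the semi-axes $c_k$ defined as
in (8); see Section 3 of [10] for details. The extremal sequence $(\theta^*_k(r_\varepsilon))_{k \in \mathbb{Z}}$ satisfies (see, for
example, [10, §3.2] and Theorem 2 in [16]):

$$(\theta^*_k(r_\varepsilon))^2 \sim \frac{\pi (2\sigma + 1)}{2\sigma (4\sigma + 1)^{1/(2\sigma)}}\, r_\varepsilon^{2 + 1/\sigma} \Big( 1 - (|k| / K_\varepsilon)^{2\sigma} \Big) \ \text{ for } 1 \le |k| \le K_\varepsilon, \quad \text{and } \theta^*_k(r_\varepsilon) = 0 \text{ otherwise}, \tag{9}$$

where

$$K_\varepsilon = \big\lfloor (4\sigma + 1)^{1/(2\sigma)} (2\pi)^{-1} r_\varepsilon^{-1/\sigma} \big\rfloor. \tag{10}$$

The sharp asymptotics for $u_\varepsilon(r_\varepsilon)$ are of the form (see [15, §4.3.2] and Theorems 2 and 4 in
[16])

$$u_\varepsilon(r_\varepsilon) \sim C(\sigma)\, r_\varepsilon^{2 + 1/(2\sigma)}\, \varepsilon^{-2}, \quad \varepsilon \to 0, \tag{11}$$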

where (see, for example, p. 104 of [10]) 

CM=2.[(l + T)(l+4a)V(-)(B(T,2))‘'’] . 

and $B(\cdot, \cdot)$ is the Euler beta-function.
2.3 Extreme problem for the ellipsoids of analytic functions

The following example of $\mathcal{F}_\sigma$ is also well known in nonparametric estimation and hypothesis
testing. Let $\mathcal{F}_\sigma$ with $\sigma > 0$ be the class of 1-periodic functions $f$ on $\mathbb{R}$ admitting a continuation
to the strip $S_\sigma = \{z = x + iy : |y| \le \sigma\} \subset \mathbb{C}$ such that $f(x + iy)$ is analytic on the interior of
$S_\sigma$, bounded on $S_\sigma$, and

$$\int_0^1 |f(x \pm i\sigma)|^2\,dx < \infty.$$

Let the norm $\|\cdot\|_{1,\sigma}$ on $\mathcal{F}_\sigma$ be given by (see, for example, [7])

$$\|f\|_{1,\sigma}^2 = \int_0^1 (\mathrm{Re}\, f(x + i\sigma))^2\,dx.$$

In terms of the Fourier coefficients, the squared norm $\|f\|_{1,\sigma}^2$ takes the form

$$\|f\|_{1,\sigma}^2 = \sum_{k \in \mathbb{Z}} \theta_k^2 c_k^2, \quad c_k^2 = c_k^2(\sigma) = \cosh^2(2\pi \sigma k).$$

In view of the relations

$$\exp(|x|) \le 2 \cosh(x) \le 2 \exp(|x|), \quad x \in \mathbb{R},$$

we may also consider an equivalent norm $\|\cdot\|_\sigma$ defined as

$$\|f\|_\sigma^2 = \sum_{k \in \mathbb{Z}} \theta_k^2 c_k^2, \quad c_k = c_k(\sigma) = \exp(2\pi \sigma |k|). \tag{12}$$

We have chosen to deal with the latter norm as it is easier to study.

The ball $\{f \in \mathcal{F}_\sigma : \|f\|_\sigma \le 1\}$ corresponds to the ellipsoid $\{\theta \in l_2(\mathbb{Z}) : \sum_{k \in \mathbb{Z}} c_k^2 \theta_k^2 \le 1\}$ with
the semi-axes defined as in (12). Thus translating the problem of testing $H_0 : f = 0$ versus
$H_1 : f \in \mathcal{F}_\sigma(r_\varepsilon)$ to the one in terms of Fourier coefficients brings us to testing $H_0 : \theta = 0$ versus
$H_1 : \theta \in \Theta_\sigma(r_\varepsilon)$. The asymptotically minimax test procedure that distinguishes between these
two hypotheses is obtained by solving the extreme problem (6) with the semi-axes $c_k$ defined
as in (12). The elements of the extremal sequence $(\theta^*_k(r_\varepsilon))_{k \in \mathbb{Z}}$ in problem (6) with the semi-axes
$c_k$ as above may be taken as constants (independent of $k$) satisfying as $\varepsilon \to 0$ (see, for example,
Section 3 in [10])

$$\theta^*_k(r_\varepsilon) \asymp r_\varepsilon \log^{-1/2}(r_\varepsilon^{-1}) \ \text{ for } 1 \le |k| \le K_\varepsilon, \quad \text{and } \theta^*_k(r_\varepsilon) = 0 \text{ otherwise}, \tag{13}$$

where

$$K_\varepsilon = \big\lfloor (2\pi\sigma)^{-1} \log(r_\varepsilon^{-1}) \big\rfloor, \tag{14}$$

and we have

$$u_\varepsilon(r_\varepsilon) \sim \frac{(2\pi\sigma)^{1/2}\, r_\varepsilon^2}{2 \varepsilon^2 \log^{1/2}(r_\varepsilon^{-1})}. \tag{15}$$

Formulas (13)-(15), as well as formulas (9)-(11), will be employed to construct almost full
selectors for the two function spaces under study.
3 Variable selection in a sequence space model

By sufficiency, the problem of recovering $f$ observed in the Gaussian white noise model can
be transformed to an equivalent problem in a sequence space model. Acting as in [13], for the
index $l \in \mathbb{Z}^d$ whose $j$th component is equal to $k$ and the other components are equal to zero,
define the function

$$\phi_{j,k}(x) = \phi_l(x) = \phi_k(x_j), \quad x = (x_1, \ldots, x_d) \in [0,1]^d, \quad 1 \le j \le d, \quad k \in \mathbb{Z},$$

and denote by $\theta_{j,k} = (f_j, \phi_k) = \int_0^1 \phi_k(x) f_j(x)\,dx$ the $k$th Fourier coefficient of the $j$th
component $f_j$ of a signal $f = \sum_{j=1}^{d} \eta_j f_j$. Consider the sequence space model

$$X_{j,k} = \eta_j \theta_{j,k} + \varepsilon \xi_{j,k}, \quad \xi_{j,k} \overset{\text{iid}}{\sim} N(0,1), \quad 1 \le j \le d, \quad k \in \mathbb{Z}, \tag{16}$$

where $X_{j,k} = X_\varepsilon(\phi_{j,k})$ are the empirical Fourier coefficients and the collection $(\eta_1 \theta_1, \ldots, \eta_d \theta_d)$
consists of sequences $\eta_j \theta_j = (\eta_j \theta_{j,k})_{k \in \mathbb{Z}}$ such that $(\eta_j) \in \mathcal{H}_{d,s}$ and for all $1 \le j \le d$,

d, = G/ 2 (Z), (17) 

fcez 

In this paper we have chosen to deal with the latter model, which is technically more
convenient. Although the set of $\theta_j$'s in (17) involves an orthogonal system in $L_2$, the results on
minimax errors and risks do not depend on the choice of this orthogonal system because the
random variables $X_{j,k}$, which generate a sufficient $\sigma$-algebra for $f \in \mathcal{F}^d_{s,\sigma}$, are independent
normal $N(\eta_j \theta_{j,k}, \varepsilon^2)$. Thus the distribution of $\{X_{j,k}\}$ depends on the Fourier coefficients $\theta_{j,k}$
of $f$ with respect to the system $\{\phi_{j,k}\}$ but not on the choice of $\{\phi_{j,k}\}$. Using a suitable finite
collection of the random variables $X_{j,k}$ as defined in (16), we wish to construct an optimal
selection procedure that recovers most non-zero components of $(\eta_1 \theta_1, \ldots, \eta_d \theta_d)$, but not all of
them.
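To make model (16) concrete, here is a minimal Python sketch that simulates a truncated version of it. The function name `simulate_sequence_model`, the truncation to finitely many frequencies, and the array layout (one row per component) are assumptions made for illustration only.

```python
import numpy as np

def simulate_sequence_model(eta, theta, eps, rng):
    """Draw X[j, k] = eta[j] * theta[j, k] + eps * xi[j, k] with iid N(0, 1) noise.

    eta   : 0/1 vector of length d (activity pattern)
    theta : array of shape (d, n_freq), Fourier coefficients kept after truncation
    eps   : noise intensity
    rng   : numpy.random.Generator supplying the Gaussian noise
    """
    eta = np.asarray(eta, dtype=float)
    theta = np.asarray(theta, dtype=float)
    if theta.ndim != 2 or theta.shape[0] != eta.shape[0]:
        raise ValueError("theta must have one row per component of eta")
    xi = rng.standard_normal(theta.shape)
    # Inactive components (eta[j] = 0) contribute pure noise.
    return eta[:, None] * theta + eps * xi
```

A typical call would pass `np.random.default_rng(seed)` as `rng` so that simulated data sets are reproducible.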


3.1 Almost full variable selection in the non-adaptive case 

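We first consider a non-adaptive setup when the sparsity parameter $s$ is known. When dealing
with the problem of variable selection in model (16), we make use of the statistics, cf.
asymptotically minimax test statistics in Section 3.1 of [10],

$$t_j = \sum_{1 \le |k| \le K_\varepsilon} \omega_k(r^*_\varepsilon(s)) \left[ \left( \frac{X_{j,k}}{\varepsilon} \right)^2 - 1 \right], \quad j = 1, \ldots, d, \tag{18}$$

where for any $r_\varepsilon > 0$ the weight functions $\omega_k(r_\varepsilon)$ are given by the formula

$$\omega_k(r_\varepsilon) = \frac{1}{2\varepsilon^2} \frac{(\theta^*_k(r_\varepsilon))^2}{u_\varepsilon(r_\varepsilon)}, \quad 1 \le |k| \le K_\varepsilon,$$

and the number $r^*_\varepsilon(s) > 0$ is the solution of the equation

$$\frac{u_\varepsilon(r^*_\varepsilon(s))}{\sqrt{2 \log(d/s)}} = 1. \tag{19}$$

For both function spaces of interest, the quantities $K_\varepsilon$, $\theta^*_k(r_\varepsilon)$, and $u_\varepsilon(r_\varepsilon)$ in formula (18) are
specified in Section 2. The sparsity parameter $s \in \{1, 2, \ldots, d\}$ is assumed to be small relative
to $d$, that is, $s = o(d)$. Note that the weights $\omega_k(r_\varepsilon)$ are normalized to have
$\sum_{1 \le |k| \le K_\varepsilon} \omega_k^2(r_\varepsilon) = 1/2$.

Now we define a non-adaptive almost full selector to be

$$\hat\eta = (\hat\eta_1, \ldots, \hat\eta_d), \quad \hat\eta_j = \mathbb{1}\Big( t_j > \sqrt{2 \log(d/s) + \delta \log d} \Big), \quad j = 1, \ldots, d, \tag{20}$$

where $\delta = \delta_\varepsilon > 0$ satisfies

$$\delta \to 0 \quad \text{and} \quad \delta \log d \to \infty, \quad \text{as } \varepsilon \to 0. \tag{21}$$

The arguments as in the proof of Theorem 1 show that for Sobolev ellipsoids, under the
conditions, cf. (23),

$$\log d = o\big(\varepsilon^{-2/(2\sigma+1)}\big) \quad \text{and} \quad \liminf_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2 \log(d/s)}} > 1,$$

the selector $\hat\eta$ reconstructs almost all relevant components of a vector $\eta \in \mathcal{H}_{d,s}$, and hence
asymptotically provides almost full recovery of a signal $f \in \mathcal{F}^d_{s,\sigma}$ in model (1).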
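The thresholding rule (20) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are passed in as a precomputed vector (in the paper they come from the extremal problem of Section 2), the data are a finite array of empirical coefficients, and the names `chi_square_statistics` and `almost_full_selector` are hypothetical.

```python
import numpy as np

def chi_square_statistics(X, weights, eps):
    """Weighted chi-square type statistics t_j = sum_k w_k * ((X[j, k] / eps)**2 - 1)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    if X.ndim != 2 or X.shape[1] != w.shape[0]:
        raise ValueError("X must have one column per weight")
    return ((X / eps) ** 2 - 1.0) @ w

def almost_full_selector(X, weights, eps, s, delta):
    """Threshold each t_j at sqrt(2 * log(d / s) + delta * log(d)), for known s."""
    d = np.asarray(X).shape[0]
    if not 1 <= s <= d:
        raise ValueError("s must lie between 1 and d")
    t = chi_square_statistics(X, weights, eps)
    threshold = np.sqrt(2.0 * np.log(d / s) + delta * np.log(d))
    return (t > threshold).astype(int)
```

For the centring and scaling of the statistics to match the text, the supplied weights should satisfy the normalization that their squares sum to 1/2.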

To illustrate the difference between exact and almost full reconstruction in adaptive
settings, assume that $\mathcal{F}_\sigma$ is the Sobolev space. In this case, a selector (see Section 3.1 of [13] with
$s$ in place of $d^{1-\beta}$)

$$\eta^* = (\eta^*_1, \ldots, \eta^*_d), \quad \eta^*_j = \mathbb{1}\Big( t^*_j > \sqrt{(2 + \delta) \log d} \Big), \quad j = 1, \ldots, d, \tag{22}$$

where the statistics $t^*_j$ are defined similar to the $t_j$ as in (18) with the relation

$$\frac{u_\varepsilon(r^*_\varepsilon(s))}{\sqrt{2 \log d} + \sqrt{2 \log s}} = 1$$

instead of (19), turns out to be a non-adaptive exact selector as long as

$$\log d = o\big(\varepsilon^{-2/(2\sigma+1)}\big) \quad \text{and} \quad \liminf_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2 \log d} + \sqrt{2 \log s}} > 1. \tag{23}$$

Under the above conditions, the procedure based on $\eta^*$ selects correctly all non-zero
components of a vector $\eta \in \mathcal{H}_{d,s}$, and hence provides exact recovery of a signal $f \in \mathcal{F}^d_{s,\sigma}$ in model
(1).

Contrasting with formula (22), the threshold in (20) is set at a lower level and is dependent
on the parameter $s$. The latter fact makes the idea of adaptation suggested in [13] for the exact
reconstruction case invalid (see Section 3.3 for details). In the next section, we obtain the
desired adaptive selector by using Lepski's method. Before doing that, we provide conditions
on $d$ as a function of $\varepsilon$ under which the thresholding procedure (20), as well as its adaptive
version introduced in Section 4.1, gives asymptotically almost full reconstruction of a function
$f \in \mathcal{F}^d_{s,\sigma}$.

3.2 Conditions for almost full variable selection 

Consider now the question of determining conditions on d as a function of e under which 
almost full variable selection is possible. Violation of these conditions will lead to entirely 
different selection strategies. 
In the sequence space of Fourier coefficients, consider testing the null hypothesis $H_{0j} : \theta_j = 0$
versus the alternative $H_{1j} : \theta_j \in \Theta_\sigma(r_\varepsilon)$, where the set $\Theta_\sigma(r_\varepsilon)$ is given by (5). It is easy to
see that under the null hypothesis $H_{0j}$, we have (see, for example, Section 4.1 of [13])

$$\mathrm{E}_0(t_j) = 0, \quad \mathrm{Var}_0(t_j) = 1,$$

while under the alternative $H_{1j}$, where for all sufficiently small $\varepsilon$ a small parameter $r_\varepsilon > 0$
satisfies $r_\varepsilon / r^*_\varepsilon(s) > 1$,

$$\mathrm{E}_{\theta_j}(t_j) = \varepsilon^{-2} \sum_{1 \le |k| \le K_\varepsilon} \omega_k(r^*_\varepsilon(s))\, \theta_{j,k}^2 \ge u_\varepsilon(r^*_\varepsilon(s)), \tag{24}$$

$$\mathrm{Var}_{\theta_j}(t_j) = 1 + O\Big( \mathrm{E}_{\theta_j}(t_j) \max_{1 \le |k| \le K_\varepsilon} \omega_k(r^*_\varepsilon(s)) \Big).$$

Furthermore, under the above restrictions on $r_\varepsilon$ and $r^*_\varepsilon(s)$ the following result holds (in case
of Sobolev spaces, see Proposition 7.1 in [5] and Lemma 1 in [13]; in case of the space $\mathcal{F}_\sigma$ of
analytic functions, the proof is similar to that of Sobolev spaces).

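Let the quantity $T = T_\varepsilon \to -\infty$ and the weight functions $\omega_k(r^*_\varepsilon(s))$ as in (18) be such that
as $\varepsilon \to 0$

$$T \max_{1 \le |k| \le K_\varepsilon} \omega_k(r^*_\varepsilon(s)) \to 0 \quad \text{and} \quad \mathrm{E}_{\theta_j}(t_j) \max_{1 \le |k| \le K_\varepsilon} \omega_k(r^*_\varepsilon(s)) \to 0. \tag{25}$$

Then as $\varepsilon \to 0$

$$\mathrm{P}_0(t_j \le T) \le \exp\Big( -\frac{T^2}{2} (1 + o(1)) \Big), \tag{26}$$

and for all $j = 1, \ldots, d$, uniformly in $\theta_j \in \Theta_\sigma(r_\varepsilon)$,

$$\mathrm{P}_{\theta_j}\big( t_j - \mathrm{E}_{\theta_j}(t_j) \le T \big) \le \exp\Big( -\frac{T^2}{2} (1 + o(1)) \Big). \tag{27}$$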

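For both function spaces $\mathcal{F}_\sigma$ of our interest, the exponential bounds (26) and (27) will
be applied below to the quantity $T = T_\varepsilon \to -\infty$ of order $O(\log^{1/2} d)$. This observation and
assumption (19) transform requirement (25) into

$$\log^{1/2} d \max_{1 \le |k| \le K_\varepsilon} \omega_k(r^*_\varepsilon(s)) \to 0, \quad \varepsilon \to 0. \tag{28}$$

Condition (28) gives a restriction on the growth of $d = d_\varepsilon$ ensuring that the selection procedure
works as designed. Indeed, as shown in Section 4.1 in [13], for the Sobolev space of $\sigma$-smooth
functions, one has

$$\omega_k(r_\varepsilon) = O\big( r_\varepsilon^{1/(2\sigma)} \big) \quad \text{for } 1 \le |k| \le K_\varepsilon,$$

and

$$r^*_\varepsilon(s) \asymp \big( \varepsilon \log^{1/4} d \big)^{4\sigma/(4\sigma+1)}.$$

Therefore condition (28) is fulfilled when

$$\log d = o\big( \varepsilon^{-2/(2\sigma+1)} \big). \tag{29}$$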
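In case of the space of analytic functions, one has

$$\omega_k(r_\varepsilon) \asymp \log^{-1/2}(r_\varepsilon^{-1}) \quad \text{for } 1 \le |k| \le K_\varepsilon, \tag{30}$$

and, in view of (15) and (19), the quantity $r^*_\varepsilon(s)$ satisfies

$$(r^*_\varepsilon(s))^2 \asymp \varepsilon^2 \log^{1/2}(d/s) \log^{1/2}\big( (r^*_\varepsilon(s))^{-1} \big),$$

implying

$$r^*_\varepsilon(s) \asymp \varepsilon \log^{1/4}(d) \log^{1/4}\big( (r^*_\varepsilon(s))^{-1} \big).$$

Therefore $\log\big( (r^*_\varepsilon(s))^{-1} \big) \sim \log(\varepsilon^{-1})$, and (see (30))

$$\omega_k(r^*_\varepsilon(s)) \asymp \log^{-1/2}(\varepsilon^{-1}).$$

From this, the technical condition (28) holds true when, cf. formula (30),

$$\log d = o(\log(\varepsilon^{-1})), \quad \varepsilon \to 0. \tag{31}$$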


4 Main results 

In this section, we consider a more realistic problem when the sparsity parameter s is unknown. 
We derive conditions under which almost full variable selection is possible, and construct a 
selector for which the Hamming distance is much smaller than the number of relevant
components (see Theorems 3 and 5). Our selector is adaptive in the sparsity parameter $s$ and
is unimprovable in the asymptotically minimax sense (see Theorems 4 and 6). In addition
to that, in Section 4.2 we provide an asymptotically exact selection procedure for the space of
analytic functions that is adaptive in the sparsity parameter $s$.

4.1 Almost full variable selection in the adaptive case 

In this subsection, the selector $\hat\eta$ as in (20) will be used to obtain the corresponding adaptive
procedure. To avoid losses due to adaptation, we will have to limit the range of the possible
values of $s$. Namely, we assume that for some constants $0 < c < C < 1$

$$c \le \liminf_{d \to \infty} (\log s / \log d) \le \limsup_{d \to \infty} (\log s / \log d) \le C, \tag{32}$$

and define the set

$$S_d = \{ s \in \{1, \ldots, d\} \text{ is such that condition (32) holds} \}$$

over which the adaptive selector that we propose yields almost full selection. The restriction
on $s$ as in (32) is relatively mild. For instance, any $s = d^\beta$ with $\beta \in [b, B]$ for some constants
$0 < b < B < 1$ belongs to $S_d$.

To construct the desired selector, for some $\Delta = \Delta_d > 0$ and $M = \lfloor (C - c)/\Delta \rfloor + 1$, pick
grid points over the interval $(1, d)$:

$$s_1 = d^c, \quad s_m = s_{m-1} d^\Delta = s_1 d^{(m-1)\Delta}, \quad 2 \le m \le M, \tag{33}$$

and assume that

$$\Delta \to 0, \quad \Delta \log d \to 0, \quad \text{as } d \to \infty, \tag{34}$$

yielding $d^\Delta \le \text{const}$ for all large enough $d$. For each $m = 1, \ldots, M$, let the parameter
$r^*_\varepsilon(s_m) > 0$ be determined by the equation, cf. (19),

$$\frac{u_\varepsilon(r^*_\varepsilon(s_m))}{\sqrt{2 \log(d/s_m)}} = 1,$$

where, depending on a type of the ellipsoid $\Theta_\sigma(r_\varepsilon)$ we are dealing with, the function $u_\varepsilon(r_\varepsilon)$
satisfies either (11) or (15).
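The geometric grid in (33) is easy to tabulate. The Python sketch below is illustrative only: the function name `sparsity_grid` is hypothetical, and the grid points are returned as real numbers without rounding to integers, which the text does not specify.

```python
import math

def sparsity_grid(d, c, C, delta):
    """Grid s_m = d**(c + (m - 1) * delta), m = 1, ..., M, with M = floor((C - c) / delta) + 1.

    d     : dimension (d > 1)
    c, C  : constants with 0 < c < C < 1 bounding log(s) / log(d)
    delta : grid step on the log_d scale (delta > 0)
    """
    if not (0.0 < c < C < 1.0) or delta <= 0.0 or d <= 1:
        raise ValueError("need 0 < c < C < 1, delta > 0 and d > 1")
    M = math.floor((C - c) / delta) + 1
    # Consecutive points differ by the constant factor d**delta.
    return [d ** (c + (m - 1) * delta) for m in range(1, M + 1)]
```

Because the number of grid points is computed with floating-point division, steps that divide the range exactly may be affected by rounding; a production version would need to guard against that.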

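Similar to the case of known $s$, consider weighted chi-square type statistics, cf. (18),

$$t_j(s_m) = \sum_{1 \le |k| \le K_\varepsilon} \omega_k(r^*_\varepsilon(s_m)) \left[ \left( \frac{X_{j,k}}{\varepsilon} \right)^2 - 1 \right], \quad j = 1, \ldots, d, \quad m = 1, \ldots, M,$$

with weight functions

$$\omega_k(r^*_\varepsilon(s_m)) = \frac{1}{2\varepsilon^2} \frac{(\theta^*_k(r^*_\varepsilon(s_m)))^2}{u_\varepsilon(r^*_\varepsilon(s_m))}, \quad 1 \le |k| \le K_\varepsilon,$$

possessing the property $\sum_{1 \le |k| \le K_\varepsilon} \omega_k^2(r^*_\varepsilon(s_m)) = 1/2$. The values of $\theta^*_k(r^*_\varepsilon(s))$ and $K_\varepsilon$ depend
on the function space under consideration. For the Sobolev space in hand, $\theta^*_k(r^*_\varepsilon(s))$ and $K_\varepsilon$
are as in (9) and (10); for the space of analytic functions, $\theta^*_k(r^*_\varepsilon(s))$ and $K_\varepsilon$ are as in (13) and
(14).

Next, for all $j = 1, \ldots, d$ and $m = 1, \ldots, M$, set

$$\hat\eta_j(s_m) = \mathbb{1}\Big( t_j(s_m) > \sqrt{2 \log(d/s_m) + \delta \log d} \Big),$$

where $\delta = \delta_\varepsilon > 0$ satisfies (21), and define an adaptive selector of a vector $\eta \in \mathcal{H}_{d,s}$ by the
formula

$$\hat\eta(s_{\hat m}) = (\hat\eta_1(s_{\hat m}), \ldots, \hat\eta_d(s_{\hat m})), \tag{35}$$

where $\hat m$ is chosen by Lepski's method (see Section 2 of [17]) as follows:

$$\hat m = \min\{ 1 \le m \le M : |\hat\eta(s_m) - \hat\eta(s_i)| \le v_i \text{ for all } i \ge m \}.$$

Here the quantities $v_i = v_{i,d}$ are set to be

$$v_i = s_i / \tau_d, \quad m \le i \le M,$$

with a sequence of numbers $\tau_d \to \infty$ satisfying (recall that $d = d_\varepsilon \to \infty$ as $\varepsilon \to 0$)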

Td = o ^min(log d, d^^^)^ , as e ^ 0. 

Algorithmically, Lepski's procedure for choosing $\hat m$ works as follows. We start by setting
$\hat m = M$ and attempt to decrease the value of $\hat m$ from $M$ to $M - 1$. If $|\hat\eta(s_{M-1}) - \hat\eta(s_M)| \le v_M$,
we set $\hat m = M - 1$; otherwise, we keep $\hat m$ equal to $M$. In case $\hat m$ is decreased to $M - 1$, we
continue the process attempting to decrease it further. If $|\hat\eta(s_{M-2}) - \hat\eta(s_{M-1})| \le v_{M-1}$ and
$|\hat\eta(s_{M-2}) - \hat\eta(s_M)| \le v_M$, we set $\hat m = M - 2$; otherwise, we keep $\hat m$ equal to $M - 1$; and so
on. Notice that by construction $v_M > v_{M-1} > \cdots > v_1$.
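The selection of the grid index can be sketched in Python as follows. This sketch implements the displayed definition of the index as a minimum over all admissible $m$, rather than the step-by-step descent described above, and the two are not guaranteed to coincide in every configuration. The function name `lepski_index`, the 0-based indexing, and the array layout are assumptions for illustration.

```python
import numpy as np

def lepski_index(selectors, grid, tau):
    """Smallest 0-based m with |eta(s_m) - eta(s_i)| <= s_i / tau for all i >= m.

    selectors : array of shape (M, d); row m holds the 0/1 selector built for s_m
    grid      : increasing grid points s_1 < ... < s_M
    tau       : positive number playing the role of tau_d
    """
    selectors = np.asarray(selectors, dtype=int)
    v = np.asarray(grid, dtype=float) / tau
    M = selectors.shape[0]
    if v.shape[0] != M:
        raise ValueError("need one grid point per selector")
    for m in range(M):
        # Hamming distances between the candidate and every selector with a larger index.
        dists = np.sum(np.abs(selectors[m] - selectors[m:]), axis=1)
        if np.all(dists <= v[m:]):
            return m
    return M - 1  # not reached: the last index always passes its own comparison
```

The adaptive selector is then the row `selectors[lepski_index(selectors, grid, tau)]`.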
4.2 Exact variable selection for analytic functions in the adaptive case

The problem of adaptive reconstruction of sparse additive functions in the Gaussian white 
noise model was studied in the only case of cj-smooth functions, see [13]. Before handling the 
problem of almost full variable selection in adaptive settings, we complement the findings in 
[13] by presenting an adaptive exact selector for the space of analytic functions. The strategy is 
similar to the one suggested in [13] for cr-smooth functions, but the parameters of the statistics 
and the condition on the dimension d are different. 

Consider a sequence space model that corresponds to the Gaussian white noise model with 
/ from the class of analytic functions Jv as defined in Section 2.3. Let 1 < si < si < ... < 
sm < d he the grid of points as in (33). For any m = 1,..., M, let the parameter r* ^ > 0 be 
determined by the equation 




\/2 log d + V2 log Sji 

Consider weighted chi-square type statistics 


= 1 . 


tj,m — 

l<\k\<Ke 




- 1 


, j = l,...,d, m=l,...,M. 


with weight functions 


‘^k[re.m) = 


1 mrijy 

Usirl^) 


obeying the normalization condition Xlfcezm) = 1/2- Next, for all j = l,...,d and 
m = 1 ,..., M, set 

'nj,m = I (tj,ni > \/ {2 + 6) (log d + log M)^ , 

and define an adaptive exact selector rj** of a vector rj G 'Hd,s by the formula (see formula (18) 
in [13]) 


r] = (r?i ,..., ), m = max j = 1,..., d. 

l<m<M 


(36) 


The idea behind the selector $\eta^{**}$ is as follows. The $j$th component of a signal is viewed as active
if at least one of the statistics $t_{j,m}$, $m = 1,\dots,M$, detects it. Therefore, thinking of $\hat\eta_{j,m}$
and $\eta^{**}_j$ as test functions, we get that the probability of having $\theta_j$ incorrectly undetected does
not exceed the respective probability for the $\hat\eta_{j,m}$ test, where $s_m$ is close to the true (but
unknown) value of $s$. Furthermore, the probability that $\eta^{**}_j$ incorrectly detects $\theta_j$ is less than
the sum of the respective probabilities for the $\hat\eta_{j,m}$ tests over all $m = 1,\dots,M$, and is small
by the choice of threshold.
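
As an illustration of this construction, the following Python sketch computes weighted chi-square type statistics and takes the maximum of the individual threshold tests over the grid. It is a minimal sketch under stated assumptions: the weights are supplied by the caller (here arbitrary placeholders, not the extremal weights of the paper), the threshold is taken to be $\sqrt{(2+\delta)(\log d + \log M)}$, and the function names `chi_square_statistics` and `adaptive_exact_selector` are hypothetical.

```python
import math


def chi_square_statistics(X, eps, weights):
    """Return t with t[j][m] = sum_k weights[m][k] * ((X[j][k] / eps)**2 - 1).

    X[j] holds the observations for component j; weights[m] holds the
    (caller-supplied, hypothetical) weights for grid point s_m.
    """
    return [
        [sum(w * ((x / eps) ** 2 - 1.0) for w, x in zip(w_m, X_j))
         for w_m in weights]
        for X_j in X
    ]


def adaptive_exact_selector(t, delta):
    """Component j is declared active if some t[j][m] exceeds the threshold."""
    d, M = len(t), len(t[0])
    threshold = math.sqrt((2.0 + delta) * (math.log(d) + math.log(M)))
    return [int(any(t_jm > threshold for t_jm in t_j)) for t_j in t]
```

Taking the maximum over $m$ is what makes the procedure free of the unknown sparsity $s$; the extra $\log M$ in the threshold is the price paid for running $M$ tests per component.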

Let the set $\Theta_{\sigma,d}(r_\varepsilon)$ be as in (37) with the coefficients $c_k$ given by (12). The following two
theorems, whose proofs are similar to those of Theorems 3 and 4 in [13], hold true.


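Theorem 1. Let $s \in \{1,\dots,d\}$ be such that $s = o(d)$. Assume that $\log d = o(\log \varepsilon^{-1})$ and
that the quantity $r_\varepsilon = r_\varepsilon(s) > 0$ satisfies

$$\liminf_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log d} + \sqrt{2\log s}} > 1.$$

Then as $\varepsilon \to 0$

$$\sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} \mathrm{E}_{\eta,\theta} |\eta - \eta^{**}| \to 0,$$

where $\eta^{**}$ is the selector of vector $\eta$ as defined in (36).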

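Theorem 2. Let $s \in \{1,\dots,d\}$ be such that $s = o(d)$. Assume that $\log d = o(\log \varepsilon^{-1})$ and
that the quantity $r_\varepsilon = r_\varepsilon(s) > 0$ satisfies

$$\limsup_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log d} + \sqrt{2\log s}} < 1.$$

Then

$$\liminf_{\varepsilon \to 0} \inf_{\tilde\eta} \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} \mathrm{E}_{\eta,\theta} |\tilde\eta - \eta| > 0,$$

where the infimum is over all selectors $\tilde\eta$ of vector $\eta$ in model (16).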

Remark 1. The sharp detection boundary in Theorems 1 and 2, which makes it possible to
decide whether or not we are in a position to proceed further with variable selection, is
determined in terms of the function $u_\varepsilon(r_\varepsilon)$ with sharp asymptotics as in (11) and (15). The use
of $u_\varepsilon(r_\varepsilon)$ instead of $r_\varepsilon$ makes it easy to build a bridge between variable selection in the Gaussian
white noise setting and variable selection in the regression setting as studied in Sec. 4 of [6]. In
addition, using $u_\varepsilon(r_\varepsilon)$ instead of $r_\varepsilon$ makes the statement of the detectability condition precise. By
'continuity' of $u_\varepsilon(r_\varepsilon)$ as cited in (7), the conditions of Theorems 1 and 2 that separate detectable
components from undetectable ones can be written in the usual form $\liminf_{\varepsilon \to 0} r_\varepsilon / r^*_\varepsilon > 1$ and
$\limsup_{\varepsilon \to 0} r_\varepsilon / r^*_\varepsilon < 1$, where for Sobolev ellipsoids the sharp detection boundary $r^*_\varepsilon$ is found
explicitly from (11), and for the ellipsoids of analytic functions it is found implicitly from (15).
A similar remark applies to Theorems 3 to 6 stated in Sections 4.3 and 4.4.

4.3 Almost full variable selection for Sobolev balls 

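Consider the set $\Theta_\sigma(r_\varepsilon)$ as in (5) with the coefficients $c_k$ given by (8), and define the set

$$\Theta_{\sigma,d}(r_\varepsilon) = \left\{ \theta = (\theta_j) : \theta_j = (\theta_{j,k}) \in \Theta_\sigma(r_\varepsilon), \ 1 \le j \le d \right\}. \tag{37}$$

Let $\hat\eta(s_{\hat m})$ be the selector given by (35) based on the statistics $t_j(s_m)$ as in (18), where the
quantities $\theta^*_k(r_\varepsilon)$, $K_\varepsilon$, and $u_\varepsilon(r_\varepsilon)$ are specified by formulas (9), (10), and (11), respectively.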

The following theorem holds. 

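Theorem 3. Let $s \in \{1,\dots,d\}$ be such that (32) holds true. Assume that $\log d = o(\varepsilon^{-2/(2\sigma+1)})$
and that the quantity $r_\varepsilon = r_\varepsilon(s) > 0$ satisfies

$$\liminf_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log(d/s)}} > 1.$$

Then as $\varepsilon \to 0$

$$\sup_{s \in S_d} \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\hat\eta(s_{\hat m}) - \eta| \to 0.$$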
Theorem 3 says that if all the hypotheses $H_{0j} : \theta_j = 0$ and $H_{1j} : \theta_j \in \Theta_\sigma(r_\varepsilon)$, $j = 1,\dots,d$,
separate asymptotically, then the selection procedure based on $\hat\eta(s_{\hat m})$ reconstructs
almost all non-zero components of a vector $\eta \in \mathcal{H}_{d,s}$, and thus provides almost full recovery of
$(\eta_1\theta_1,\dots,\eta_d\theta_d)$, uniformly in $S_d$, $\mathcal{H}_{d,s}$, and $\Theta_{\sigma,d}(r_\varepsilon)$.

The next result shows that if the detectability condition is not met, almost full selection 
is impossible. 


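Theorem 4. Let $s \in \{1,\dots,d\}$ be such that $s = o(d)$. Assume that $\log d = o(\varepsilon^{-2/(2\sigma+1)})$ and
that the quantity $r_\varepsilon = r_\varepsilon(s) > 0$ satisfies

$$\limsup_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log(d/s)}} < 1.$$

Then

$$\liminf_{\varepsilon \to 0} \inf_{\tilde\eta} \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\tilde\eta - \eta| > 0,$$

where the infimum is over all selectors $\tilde\eta$ of a vector $\eta$ in model (16).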


4.4 Almost full variable selection for analytic functions 

Results similar to Theorems 3 and 4 hold true for the space of analytic functions. Namely,
consider the sets $\Theta_\sigma(r_\varepsilon)$ and $\Theta_{\sigma,d}(r_\varepsilon)$ as in (5) and (37) with the coefficients $c_k$ given by (12).
Again, let $\hat\eta(s_{\hat m})$ be the selector defined by (35) based on the statistics $t_j(s_m)$ as in (18), but
the quantities $\theta^*_k(r_\varepsilon)$, $K_\varepsilon$, and $u_\varepsilon(r_\varepsilon)$ are now as in (13), (14), and (15), respectively. The
following results hold true.

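Theorem 5. Let $s \in \{1,\dots,d\}$ be such that (32) holds true. Assume that $\log d = o(\log \varepsilon^{-1})$
and that the quantity $r_\varepsilon = r_\varepsilon(s) > 0$ satisfies

$$\liminf_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log(d/s)}} > 1.$$

Then as $\varepsilon \to 0$

$$\sup_{s \in S_d} \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\hat\eta(s_{\hat m}) - \eta| \to 0.$$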


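Theorem 6. Let $s \in \{1,\dots,d\}$ be such that $s = o(d)$. Assume that $\log d = o(\log \varepsilon^{-1})$ and
that the quantity $r_\varepsilon = r_\varepsilon(s) > 0$ satisfies

$$\limsup_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log(d/s)}} < 1.$$

Then

$$\liminf_{\varepsilon \to 0} \inf_{\tilde\eta} \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\tilde\eta - \eta| > 0,$$

where the infimum is over all selectors $\tilde\eta$ of a vector $\eta$ in model (16).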


Remark 2. We should remark that the best selection procedure yields exact variable selection
only if the condition

$$\liminf_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log d} + \sqrt{2\log s}} > 1$$

holds; at the same time, the best selection procedure gives almost full variable selection if the
milder condition

$$\liminf_{\varepsilon \to 0} \frac{u_\varepsilon(r_\varepsilon)}{\sqrt{2\log(d/s)}} > 1$$

is met.
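
To see how much milder the second condition is, one can compare the two denominators numerically. The short Python sketch below does this for illustrative values of $d$ and $s$; the function names are hypothetical and the numbers are not taken from the paper.

```python
import math


def exact_selection_boundary(d, s):
    """Denominator in the exact selection condition."""
    return math.sqrt(2.0 * math.log(d)) + math.sqrt(2.0 * math.log(s))


def almost_full_selection_boundary(d, s):
    """Denominator in the almost full selection condition."""
    return math.sqrt(2.0 * math.log(d / s))


if __name__ == "__main__":
    d = 10 ** 6
    for s in (10, 1000, 100000):
        print(s, exact_selection_boundary(d, s), almost_full_selection_boundary(d, s))
```

For every $1 < s < d$ the almost full boundary is strictly smaller, and the gap widens as $s$ grows, since $\log(d/s)$ decreases while $\log s$ increases.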





5 Proofs of the Theorems 


In this section, we prove Theorems 3 and 4. The proofs of Theorems 5 and 6 go along the 
same lines and therefore are omitted. Throughout the proof, the exponential bounds (26) and 
(27) on the tail probabilities of the statistics $t_j(s)$ will be frequently used.

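Proof of Theorem 3. Let $m_0 \in \{2,\dots,M\}$ be such that

$$s_{m_0-1} \le s < s_{m_0},$$

which implies that $s_{m_0}/s \le d^{\Delta}$. Then, using the definition of the selector $\hat\eta(s_{\hat m})$, we can write

$$\begin{aligned}
\sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\hat\eta(s_{\hat m}) - \eta|
&\le \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} \left( |\hat\eta(s_{\hat m}) - \eta| \,\big|\, \hat m \le m_0 \right) \mathrm{P}_{\eta,\theta}(\hat m \le m_0) \\
&\quad + \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} \left( |\hat\eta(s_{\hat m}) - \eta| \,\big|\, \hat m > m_0 \right) \mathrm{P}_{\eta,\theta}(\hat m > m_0) \\
&\le \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} \left( |\hat\eta(s_{\hat m}) - \eta| \,\big|\, \hat m \le m_0 \right) \mathrm{P}_{\eta,\theta}(\hat m \le m_0) \\
&\quad + \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} (d/s) \, \mathrm{P}_{\eta,\theta}(\hat m > m_0) =: I_1 + I_2.
\end{aligned} \tag{38}$$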


To complete the proof, we need to show that $I_1$ and $I_2$ are both negligibly small when $\varepsilon$ is
small.

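Consider the term $I_1$ and observe that for all $\eta \in \mathcal{H}_{d,s}$ and $\theta \in \Theta_{\sigma,d}(r_\varepsilon)$,

$$\begin{aligned}
s^{-1} \mathrm{E}_{\eta,\theta} \left( |\hat\eta(s_{\hat m}) - \eta| \,\big|\, \hat m \le m_0 \right) \mathrm{P}_{\eta,\theta}(\hat m \le m_0)
&\le s^{-1} \mathrm{E}_{\eta,\theta} \left( |\hat\eta(s_{\hat m}) - \hat\eta(s_{m_0})| \,\big|\, \hat m \le m_0 \right) \mathrm{P}_{\eta,\theta}(\hat m \le m_0) \\
&\quad + s^{-1} \mathrm{E}_{\eta,\theta} \left( |\hat\eta(s_{m_0}) - \eta| \,\big|\, \hat m \le m_0 \right) \mathrm{P}_{\eta,\theta}(\hat m \le m_0) \\
&\le s^{-1} v_{m_0} + s^{-1} \mathrm{E}_{\eta,\theta} |\hat\eta(s_{m_0}) - \eta|,
\end{aligned}$$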

where, by (34) and the choice of the sequences $\tau_d$ and $\Delta$, $s^{-1} v_{m_0} = o(1)$.

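Next, by definition of the set $\mathcal{H}_{d,s}$ of $s$-sparse $d$-dimensional vectors $\eta$, we have

$$\begin{aligned}
\sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\hat\eta(s_{m_0}) - \eta|
&\le (d/s) \, \mathrm{P}_0 \left( t_1(s_{m_0}) > \sqrt{2\log(d/s_{m_0}) + \delta \log d} \right) \\
&\quad + \sup_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \mathrm{P}_{\theta_1} \left( t_1(s_{m_0}) \le \sqrt{2\log(d/s_{m_0}) + \delta \log d} \right),
\end{aligned} \tag{39}$$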


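where by (26) the first summand in the above expression satisfies

$$\begin{aligned}
(d/s) \, \mathrm{P}_0 \left( t_1(s_{m_0}) > \sqrt{2\log(d/s_{m_0}) + \delta \log d} \right)
&\le (d/s) \exp\left( -\left( \log(d/s_{m_0}) + (\delta/2) \log d \right)(1 + o(1)) \right) \\
&= O\left( (s_{m_0}/s) \, d^{-\delta/2} \right) = O\left( d^{\Delta - \delta/2} \right) = o(1),
\end{aligned}$$

and the last equality is due to (21) and (34).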
To treat the second term on the right side of (39), recall that $1 < s_{m_0}/s \le d^{\Delta}$. Then,
by the assumption on the parameter $r_\varepsilon = r_\varepsilon(s)$ and the 'continuity' of the function $u_\varepsilon(r_\varepsilon)$ as
stated in (7), using the fact that $\Delta \log d \to 0$ as $d \to \infty$, one can find a constant $\delta_1 > 0$ such
that for all sufficiently small $\varepsilon$

$$r_\varepsilon \ge r^*_\varepsilon(s_{m_0})(1 + \delta_1).$$

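From this, using Proposition 4.1 in [5] and recalling formula (24),

$$\begin{aligned}
\inf_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \mathrm{E}_{\theta_1}(t_1(s_{m_0}))
&\ge \inf_{\theta_1 \in \Theta_\sigma(r^*_\varepsilon(s_{m_0})(1+\delta_1))} \mathrm{E}_{\theta_1}(t_1(s_{m_0})) \\
&\ge (1+\delta_1)^2 \inf_{\theta_1 \in \Theta_\sigma(r^*_\varepsilon(s_{m_0}))} \mathrm{E}_{\theta_1}(t_1(s_{m_0})) \ge (1+\delta_1)^2 u_\varepsilon(r^*_\varepsilon(s_{m_0})) \\
&= (1+\delta_1)^2 \sqrt{2\log(d/s_{m_0})} > \sqrt{2\log(d/s_{m_0}) + \delta \log d},
\end{aligned} \tag{40}$$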

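where the last inequality holds because the bounds on $s_{m_0}$ imply $\delta \log d = o(\log(d/s_{m_0}))$.
Thus as $\varepsilon \to 0$

$$\sqrt{2\log(d/s_{m_0}) + \delta \log d} - \inf_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \mathrm{E}_{\theta_1}(t_1(s_{m_0})) \to -\infty. \tag{41}$$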

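Now (27) in combination with (40) and (41) gives, uniformly in $\theta_1 \in \Theta_\sigma(r_\varepsilon)$,

$$\begin{aligned}
\mathrm{P}_{\theta_1} &\left( t_1(s_{m_0}) \le \sqrt{2\log(d/s_{m_0}) + \delta \log d} \right) \\
&\le \mathrm{P}_{\theta_1} \left( t_1(s_{m_0}) - \mathrm{E}_{\theta_1}(t_1(s_{m_0})) \le \sqrt{2\log(d/s_{m_0}) + \delta \log d} - \inf_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \mathrm{E}_{\theta_1}(t_1(s_{m_0})) \right) \\
&\le \mathrm{P}_{\theta_1} \left( t_1(s_{m_0}) - \mathrm{E}_{\theta_1}(t_1(s_{m_0})) \le -\sqrt{2\log(d/s_{m_0})} \left[ (1+\delta_1)^2 - 1 + o(1) \right] \right) \\
&\le \exp\left( -\log(d/s_{m_0}) \left[ (1+\delta_1)^2 - 1 + o(1) \right]^2 (1 + o(1)) \right) \\
&= O\left( (s_{m_0}/d)^{[(1+\delta_1)^2 - 1]^2} \right) = o(1).
\end{aligned}$$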

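Putting everything together, we conclude that the first term on the right side of (38) satisfies

$$I_1 = o(1), \quad \varepsilon \to 0. \tag{42}$$

Let us now show that

$$I_2 = \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} (d/s) \, \mathrm{P}_{\eta,\theta}(\hat m > m_0) = o(1).$$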

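By definition of $\hat m$, for all $\eta \in \mathcal{H}_{d,s}$ and all $\theta \in \Theta_{\sigma,d}(r_\varepsilon)$,

$$\begin{aligned}
\mathrm{P}_{\eta,\theta}(\hat m > m_0) &= \sum_{k=m_0}^{M} \mathrm{P}_{\eta,\theta}(\hat m = k) \\
&= \sum_{k=m_0}^{M} \mathrm{P}_{\eta,\theta} \left( \exists\, i \in \{k,\dots,M\} : |\hat\eta(s_{k-1}) - \hat\eta(s_i)| > v_i \right) \\
&\le \sum_{k=m_0}^{M} \sum_{i=k}^{M} \mathrm{P}_{\eta,\theta} \left( |\hat\eta(s_{k-1}) - \hat\eta(s_i)| > v_i \right) \\
&= \sum_{k=m_0}^{M} \sum_{i=k}^{M} \mathrm{P}_{\eta,\theta} \left( \sum_{j=1}^{d} |\hat\eta_j(s_{k-1}) - \hat\eta_j(s_i)| > v_i \right).
\end{aligned}$$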
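Now, we introduce independent events

$$A_j(s) = \left\{ t_j(s) > \sqrt{2\log(d/s) + \delta \log d} \right\}, \quad j = 1,\dots,d,$$

and denote by $\bar A_j(s)$ the complement of $A_j(s)$. Observing that for all $m_0 \le k \le i \le M$ the
quantity $|\hat\eta_j(s_{k-1}) - \hat\eta_j(s_i)|$ is non-zero only if either $A_j(s_{k-1}) \cap \bar A_j(s_i)$ or $\bar A_j(s_{k-1}) \cap A_j(s_i)$
occurs, we may continue

$$\mathrm{P}_{\eta,\theta}(\hat m > m_0) \le \sum_{k=m_0}^{M} \sum_{i=k}^{M} \mathrm{P}_{\eta,\theta} \left( \sum_{j=1}^{d} \left[ \mathbb{1}\left( A_j(s_{k-1}) \cap \bar A_j(s_i) \right) + \mathbb{1}\left( \bar A_j(s_{k-1}) \cap A_j(s_i) \right) \right] > v_i \right).$$
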

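To bound this sum, we shall apply Bernstein's inequality, which says that if $X_1,\dots,X_d$ are
independent random variables such that for all $j = 1,\dots,d$ and for some $H > 0$

$$\mathrm{E}(X_j) = 0 \quad \text{and} \quad |\mathrm{E}(X_j^m)| \le \frac{\mathrm{E}(X_j^2)}{2} H^{m-2} m!, \quad m = 2,3,\dots, \tag{43}$$

then (see, for example, pp. 164-165 of [2])

$$\max\left\{ \mathrm{P}(S_d \ge t), \mathrm{P}(S_d \le -t) \right\} \le
\begin{cases}
\exp\left( -t^2/(4B_d^2) \right) & \text{if } 0 \le t \le B_d^2/H, \\
\exp\left( -t/(4H) \right) & \text{if } t \ge B_d^2/H,
\end{cases} \tag{44}$$

where $S_d = \sum_{j=1}^{d} X_j$ and $B_d^2 = \sum_{j=1}^{d} \mathrm{E}(X_j^2)$. Observe that for independent random variables
$X_1,\dots,X_d$ with the property

$$\mathrm{E}(X_j) = 0 \quad \text{and} \quad |X_j| \le M, \quad j = 1,\dots,d,$$

for some $M > 0$, the Bernstein condition (43) holds with $H = M/3$. Below we will use
Bernstein's inequality in the case of $t \ge B_d^2/H$.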
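
The two regimes of the Bernstein bound can be made concrete with a few lines of Python. This is a minimal sketch, assuming the bound has the two-piece form $\exp(-t^2/(4B_d^2))$ for $0 \le t \le B_d^2/H$ and $\exp(-t/(4H))$ for $t \ge B_d^2/H$; the function name `bernstein_tail_bound` and the argument name `B2` (standing for $B_d^2$) are hypothetical.

```python
import math


def bernstein_tail_bound(t, B2, H):
    """Upper bound on P(S_d >= t) for a sum of independent centred variables.

    B2 stands for the sum of the variances and H for the constant in the
    Bernstein moment condition (H = M/3 when |X_j| <= M).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if t <= B2 / H:
        return math.exp(-t * t / (4.0 * B2))   # sub-Gaussian regime
    return math.exp(-t / (4.0 * H))            # sub-exponential regime


if __name__ == "__main__":
    # With |X_j| <= 4 one has H = 4/3, so the large-t bound is exp(-3t/16).
    print(bernstein_tail_bound(10.0, 1.0, 4.0 / 3.0), math.exp(-30.0 / 16.0))
```

In the proof only the second regime matters: the variance sum is of smaller order than the level $t = v_i$, so the bound $\exp(-t/(4H))$ applies.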

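To do this, let us introduce random variables $X_j = X_j(s_{k-1}, s_i)$, $1 \le j \le d$, $m_0 \le k \le M$,
$k \le i \le M$, by the formula

$$\begin{aligned}
X_j &= \mathbb{1}\left( A_j(s_{k-1}) \cap \bar A_j(s_i) \right) + \mathbb{1}\left( \bar A_j(s_{k-1}) \cap A_j(s_i) \right) \\
&\quad - \left[ \mathrm{P}_{\eta,\theta}\left( A_j(s_{k-1}) \cap \bar A_j(s_i) \right) + \mathrm{P}_{\eta,\theta}\left( \bar A_j(s_{k-1}) \cap A_j(s_i) \right) \right],
\end{aligned}$$

and observe that $|X_j| \le 4$, $j = 1,\dots,d$, and for all $\eta \in \mathcal{H}_{d,s}$ and $\theta \in \Theta_{\sigma,d}(r_\varepsilon)$

$$\mathrm{E}_{\eta,\theta}(X_j) = 0, \quad j = 1,\dots,d.$$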


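Before applying Bernstein's inequality, we show that for all $\eta \in \mathcal{H}_{d,s}$ and $\theta \in \Theta_{\sigma,d}(r_\varepsilon)$, and for
all $m_0 \le k \le M$ and $k \le i \le M$

$$\sum_{j=1}^{d} \left[ \mathrm{P}_{\eta,\theta}\left( A_j(s_{k-1}) \cap \bar A_j(s_i) \right) + \mathrm{P}_{\eta,\theta}\left( \bar A_j(s_{k-1}) \cap A_j(s_i) \right) \right] = o(v_i). \tag{45}$$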





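We have

$$\begin{aligned}
\sup_{\eta \in \mathcal{H}_{d,s}} &\sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} \sum_{j=1}^{d} \left[ \mathrm{P}_{\eta,\theta}\left( A_j(s_{k-1}) \cap \bar A_j(s_i) \right) + \mathrm{P}_{\eta,\theta}\left( \bar A_j(s_{k-1}) \cap A_j(s_i) \right) \right] \\
&= (d-s) \left[ \mathrm{P}_0\left( A_1(s_{k-1}) \cap \bar A_1(s_i) \right) + \mathrm{P}_0\left( \bar A_1(s_{k-1}) \cap A_1(s_i) \right) \right] \\
&\quad + s \sup_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \left[ \mathrm{P}_{\theta_1}\left( A_1(s_{k-1}) \cap \bar A_1(s_i) \right) + \mathrm{P}_{\theta_1}\left( \bar A_1(s_{k-1}) \cap A_1(s_i) \right) \right] \\
&\le d \left[ \mathrm{P}_0\left( t_1(s_{k-1}) > \sqrt{2\log(d/s_{k-1}) + \delta \log d} \right) + \mathrm{P}_0\left( t_1(s_i) > \sqrt{2\log(d/s_i) + \delta \log d} \right) \right] \\
&\quad + s \sup_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \left[ \mathrm{P}_{\theta_1}\left( t_1(s_{k-1}) \le \sqrt{2\log(d/s_{k-1}) + \delta \log d} \right) + \mathrm{P}_{\theta_1}\left( t_1(s_i) \le \sqrt{2\log(d/s_i) + \delta \log d} \right) \right] \\
&=: J_1(s_{k-1}, s_i) + J_2(s_{k-1}, s_i).
\end{aligned} \tag{46}$$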


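Recalling (26) and the assumed rate of the sequence $\tau_d$ as $d \to \infty$, we have

$$d \, \mathrm{P}_0\left( t_1(s_i) > \sqrt{2\log(d/s_i) + \delta \log d} \right) \le d \exp\left( -\left( \log(d/s_i) + (\delta/2) \log d \right)(1 + o(1)) \right) = O\left( s_i d^{-\delta/2} \right) = o(v_i).$$

Similarly, using the fact that $v_{k-1} \le v_i$ when $k \le i \le M$, we obtain

$$d \, \mathrm{P}_0\left( t_1(s_{k-1}) > \sqrt{2\log(d/s_{k-1}) + \delta \log d} \right) = o(v_{k-1}) = o(v_i).$$

Therefore for all $m_0 \le k \le M$ and $k \le i \le M$

$$J_1(s_{k-1}, s_i) = o(v_i). \tag{47}$$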

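Consider the second term on the right side of (46), $J_2(s_{k-1}, s_i)$. First, note that for all
$m_0 \le k \le M$ and $k \le i \le M$,

$$s < s_i \quad \text{and} \quad s < s_{k-1}, \quad k \ne m_0,$$

and for $k = m_0$ one has $s_{k-1} = s_{m_0-1} \le s$, which implies $s/s_{m_0-1} \le d^{\Delta}$. Therefore, by the
assumption on $r_\varepsilon = r_\varepsilon(s)$ and the 'continuity' of the function $u_\varepsilon(r_\varepsilon)$ as cited in (7), using the
fact that $\Delta \log d \to 0$ as $d \to \infty$, one can find constants $\delta_2 > 0$ and $\delta_3 > 0$ such that for all
sufficiently small $\varepsilon$

$$r_\varepsilon \ge r^*_\varepsilon(s_i)(1 + \delta_2) \quad \text{and} \quad r_\varepsilon \ge r^*_\varepsilon(s_{k-1})(1 + \delta_3)$$

when $m_0 \le k \le M$ and $k \le i \le M$. From this, for all sufficiently small $\varepsilon$, cf. (40),

$$\inf_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \mathrm{E}_{\theta_1}(t_1(s_i)) \ge (1+\delta_2)^2 \sqrt{2\log(d/s_i)} > \sqrt{2\log(d/s_i) + \delta \log d}, \tag{48}$$

and hence as $\varepsilon \to 0$

$$\sqrt{2\log(d/s_i) + \delta \log d} - \inf_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \mathrm{E}_{\theta_1}(t_1(s_i)) \to -\infty. \tag{49}$$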




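It now follows from (27), (48), and (49) that, uniformly in $\theta_1 \in \Theta_\sigma(r_\varepsilon)$,

$$\begin{aligned}
s \, \mathrm{P}_{\theta_1} &\left( t_1(s_i) \le \sqrt{2\log(d/s_i) + \delta \log d} \right) \\
&\le s \, \mathrm{P}_{\theta_1} \left( t_1(s_i) - \mathrm{E}_{\theta_1}(t_1(s_i)) \le \sqrt{2\log(d/s_i) + \delta \log d} - \inf_{\theta_1 \in \Theta_\sigma(r_\varepsilon)} \mathrm{E}_{\theta_1}(t_1(s_i)) \right) \\
&\le s \, \mathrm{P}_{\theta_1} \left( t_1(s_i) - \mathrm{E}_{\theta_1}(t_1(s_i)) \le -\sqrt{2\log(d/s_i)} \left[ (1+\delta_2)^2 - 1 + o(1) \right] \right) \\
&\le s \exp\left( -\log(d/s_i) \left[ (1+\delta_2)^2 - 1 + o(1) \right]^2 (1 + o(1)) \right).
\end{aligned}$$

Also, as relation (49) continues to hold with $s_{k-1}$, $m_0 < k \le M$, instead of $s_i$, similar arguments
yield

$$s \, \mathrm{P}_{\theta_1} \left( t_1(s_{k-1}) \le \sqrt{2\log(d/s_{k-1}) + \delta \log d} \right) = o(v_{k-1}) = o(v_i),$$

which implies

$$J_2(s_{k-1}, s_i) = o(v_i). \tag{50}$$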


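Combining (46), (47) and (50), we arrive at (45). We see then by (45) that

$$\sum_{j=1}^{d} \mathrm{E}_{\eta,\theta}(X_j^2) = \sum_{j=1}^{d} \left[ \mathrm{P}_{\eta,\theta}\left( A_j(s_{k-1}) \cap \bar A_j(s_i) \right) + \mathrm{P}_{\eta,\theta}\left( \bar A_j(s_{k-1}) \cap A_j(s_i) \right) \right] (1 + o(1)) = o(v_i).$$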


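Therefore, the use of Bernstein's inequality as in (44) for the case of $t \ge B_d^2/H$ with $H = 4/3$
gives

$$\begin{aligned}
I_2 &= \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} (d/s) \, \mathrm{P}_{\eta,\theta}(\hat m > m_0) \\
&\le \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} (d/s) \sum_{k=m_0}^{M} \sum_{i=k}^{M} \mathrm{P}_{\eta,\theta} \left( \sum_{j=1}^{d} X_j > v_i(1 + o(1)) \right) \\
&\le (d/s) \sum_{k=m_0}^{M} \sum_{i=k}^{M} \exp\left( -(3v_i/16)(1 + o(1)) \right) = O\left( M^2 (d/s) \exp\left( -(3/16) v_{m_0} \right) \right) = o(1).
\end{aligned}$$

This in combination with (38) and (42) completes the proof of Theorem 3. □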

5.1 Proof of Theorem 4 

To prove the theorem, we first pick good prior distributions on $\eta = (\eta_j)$ and $\theta = (\theta_j)$. Having
done this, we bound the normalized minimax risk by the normalized Bayes risk and show that
the latter is strictly positive. The first part of the proof, up to relation (55), goes along the lines
of that of Theorem 2 in [13], with $p = s/d$ in place of the value of $p$ used there.
Let $\theta^* = (\theta^*_k)$ be the extremal sequence in the problem (the same for all $j = 1,\dots,d$)

$$\frac{1}{2\varepsilon^4} \sum_{k} \theta_{j,k}^4 \to \inf_{\theta_j \in \Theta_\sigma(r_\varepsilon)}.$$


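Let the prior distribution of a 'vector' $\theta = (\theta_1,\dots,\theta_d) \in \Theta_{\sigma,d}(r_\varepsilon)$ be of the form

$$\pi_\theta(d\theta) = \prod_{j=1}^{d} \pi_{\theta_j}(d\theta_j), \quad \pi_{\theta_j}(d\theta_j) = \prod_{1 \le |k| \le K_\varepsilon} \left( \frac{\delta_{-\theta^*_k} + \delta_{\theta^*_k}}{2} \right)(d\theta_{j,k}),$$

where $\delta_x$ is the $\delta$-measure that puts a pointmass 1 at $x$. Denote by

$$p = s/d$$

the portion of non-zero components of vector $\eta = (\eta_1,\dots,\eta_d) \in \mathcal{H}_{d,s}$. The prior distribution
of $\eta$ is naturally defined to be

$$\pi_\eta(d\eta) = \prod_{j=1}^{d} \pi_{\eta_j}(d\eta_j), \quad \pi_{\eta_j}(d\eta_j) = \left( (1-p)\delta_0 + p\delta_1 \right)(d\eta_j).$$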

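Then, assuming that $\theta = (\theta_j)$ and $\eta = (\eta_j)$ are independent, we get

$$\begin{aligned}
R_\varepsilon &:= \inf_{\tilde\eta} \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\tilde\eta - \eta| \ge s^{-1} \inf_{\tilde\eta} \mathrm{E}_{\pi_\eta} \mathrm{E}_{\pi_\theta} \mathrm{E}_{\eta,\theta} |\tilde\eta - \eta| \\
&= s^{-1} \inf_{\tilde\eta} \mathrm{E}_{\pi_\eta} \mathrm{E}_{\pi_\theta} \mathrm{E}_{\eta,\theta} \sum_{j=1}^{d} |\tilde\eta_j - \eta_j| = s^{-1} \inf_{\tilde\eta} \sum_{j=1}^{d} \mathrm{E}_{\pi_{\eta_j}} \mathrm{E}_{\pi_{\theta_j}} \mathrm{E}_{\eta_j,\theta_j} |\tilde\eta_j - \eta_j|,
\end{aligned}$$

where the infimum is over all selectors $\tilde\eta = (\tilde\eta_j)$ and $\mathrm{E}_{\eta_j,\theta_j}$ is the expected value that corresponds
to the measure $\mathrm{P}_{\eta_j,\theta_j}$ induced by the observation $X_j = (X_{j,k})_{1 \le |k| \le K_\varepsilon}$ consisting of independent
random variables $X_{j,k}$ that follow normal distributions $N(\eta_j\theta_{j,k}, \varepsilon^2)$.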

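Consider the mixture of distributions given by the formula

$$\mathrm{P}_{\pi,\eta_j}(dX_j) = \mathrm{E}_{\pi_{\theta_j}} \mathrm{P}_{\eta_j,\theta_j}(dX_j) = \prod_{1 \le |k| \le K_\varepsilon} \left( \frac{N(-\eta_j\theta^*_k, \varepsilon^2) + N(\eta_j\theta^*_k, \varepsilon^2)}{2} \right)(dX_{j,k}). \tag{51}$$

In particular, when $\eta_j = 0$, $\mathrm{P}_{\pi,0}(dX_j) = \prod_{1 \le |k| \le K_\varepsilon} N(0, \varepsilon^2)(dX_{j,k})$. Using the notation

$$v^*_k = \frac{\theta^*_k}{\varepsilon},$$

we obtain with respect to the probability measure $\mathrm{P}_{\pi,\eta_j}$

$$Y_{j,k} := \frac{X_{j,k}}{\varepsilon} = \eta_j v^*_k + \xi_{j,k} \sim N(\eta_j v^*_k, 1), \quad 1 \le j \le d, \quad 1 \le |k| \le K_\varepsilon. \tag{52}$$

Next, denoting $Y_j = (Y_{j,k})_{1 \le |k| \le K_\varepsilon}$, we may rewrite the likelihood ratio in the form

$$\frac{d\mathrm{P}_{\pi,\eta_j}}{d\mathrm{P}_{\pi,0}}(Y_j) = \prod_{1 \le |k| \le K_\varepsilon} \exp\left( -\frac{\eta_j^2 (v^*_k)^2}{2} \right) \cosh\left( \eta_j v^*_k Y_{j,k} \right). \tag{53}$$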
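From this, using the fact that each $\eta_j$ takes on only two values, zero and one, with respective
probabilities $(1-p)$ and $p$, we may continue

$$R_\varepsilon \ge s^{-1} \inf_{\tilde\eta} \sum_{j=1}^{d} \mathrm{E}_{\pi_{\eta_j}} \mathrm{E}_{\pi,\eta_j} |\tilde\eta_j - \eta_j| \ge s^{-1} \sum_{j=1}^{d} \inf_{\tilde\eta_j} \left[ (1-p) \mathrm{E}_{\pi,0}(\tilde\eta_j) + p \, \mathrm{E}_{\pi,1}(1 - \tilde\eta_j) \right], \tag{54}$$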


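where $\inf_{\tilde\eta_j} \left[ (1-p) \mathrm{E}_{\pi,0}(\tilde\eta_j) + p \, \mathrm{E}_{\pi,1}(1 - \tilde\eta_j) \right]$ is the Bayes risk in the problem of testing two simple
hypotheses

$$H_0 : \mathrm{P} = \mathrm{P}_{\pi,0} \quad \text{vs.} \quad H_1 : \mathrm{P} = \mathrm{P}_{\pi,1},$$

with the probability measures $\mathrm{P}_{\pi,0}$ and $\mathrm{P}_{\pi,1}$ defined according to (51). In particular, under
the null hypothesis, the vector $Y_j = (Y_{j,k})_{1 \le |k| \le K_\varepsilon}$ has a normal distribution with density
function $p_{\pi,0}(t) = \prod_{1 \le |k| \le K_\varepsilon} (2\pi)^{-1/2} \exp(-t_k^2/2)$, $t = (t_k)_{1 \le |k| \le K_\varepsilon}$. By (53) the likelihood ratio
in this problem becomes

$$\Lambda_\pi(Y_j) := \frac{d\mathrm{P}_{\pi,1}}{d\mathrm{P}_{\pi,0}}(Y_j) = \prod_{1 \le |k| \le K_\varepsilon} \exp\left( -\frac{(v^*_k)^2}{2} \right) \cosh\left( v^*_k Y_{j,k} \right),$$

and the optimal (Bayes) test $\eta_B$ that minimizes the Bayes risk in hand has the form (see, for
example, [4, Sec. 8.11])

$$\eta_B(Y_j) = \mathbb{1}\left( \Lambda_\pi(Y_j) \ge \frac{1-p}{p} \right).$$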
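
The Bayes test just described is easy to evaluate on the log scale. The Python sketch below is an illustration under stated assumptions: the likelihood ratio is taken to be the product of the factors $\exp(-(v^*_k)^2/2)\cosh(v^*_k Y_{j,k})$, the test rejects when it is at least $(1-p)/p$, and the function names `log_cosh`, `log_likelihood_ratio` and `bayes_selector` are hypothetical.

```python
import math


def log_cosh(x):
    """Numerically stable log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log 2."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_likelihood_ratio(y, v):
    """Logarithm of the product of exp(-v_k**2 / 2) * cosh(v_k * y_k)."""
    return sum(-0.5 * vk * vk + log_cosh(vk * yk) for yk, vk in zip(y, v))


def bayes_selector(y, v, p):
    """Return 1 if the likelihood ratio is at least (1 - p) / p, else 0."""
    return int(log_likelihood_ratio(y, v) >= math.log((1.0 - p) / p))
```

Working with logarithms avoids overflow of $\cosh$ for large arguments, and the comparison with $\log((1-p)/p)$ is the quantity denoted $H$ later in the proof.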

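Using this, we infer from (54) that

$$\begin{aligned}
R_\varepsilon &= \inf_{\tilde\eta} \sup_{\eta \in \mathcal{H}_{d,s}} \sup_{\theta \in \Theta_{\sigma,d}(r_\varepsilon)} s^{-1} \mathrm{E}_{\eta,\theta} |\tilde\eta - \eta| \\
&\ge (d/s)(1-p) \, \mathrm{P}_{\pi,0}\left( \Lambda_\pi(Y_1) \ge \frac{1-p}{p} \right) + \mathrm{P}_{\pi,1}\left( \Lambda_\pi(Y_1) < \frac{1-p}{p} \right) =: A_\varepsilon + B_\varepsilon,
\end{aligned} \tag{55}$$

where, under the $\mathrm{P}_{\pi,\eta_1}$-probability with $\eta_1 \in \{0, 1\}$, the vector $Y_1 = (Y_{1,k})_{1 \le |k| \le K_\varepsilon}$ has independent
normal components

$$Y_{1,k} = \eta_1 v^*_k + \xi_{1,k} \sim N(\eta_1 v^*_k, 1), \quad 1 \le |k| \le K_\varepsilon.$$

It now follows from (55) that the minimax risk $R_\varepsilon$ is positive if at least one of the terms, $A_\varepsilon$
or $B_\varepsilon$, is positive. Let us prove that for all sufficiently small $\varepsilon$ the probability $B_\varepsilon$ is separated
from zero.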

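Recall that $d = d_\varepsilon \to \infty$ and $s = s_d = o(d)$ as $\varepsilon \to 0$. Put

$$H = H_\varepsilon = \log\frac{1-p}{p} \sim \log(d/s),$$

and introduce the random variable

$$\lambda_\pi = \lambda_\pi(Y_1) := \log \Lambda_\pi(Y_1).$$

Using the notation $\mathrm{P}_0$ for $\mathrm{P}_{\pi,0}$, consider the probability measure $\mathrm{P}_h$, depending on a positive
parameter $h = h_\varepsilon$, that is defined by the formula

$$\frac{d\mathrm{P}_h}{d\mathrm{P}_0}(Y_1) = \frac{\exp\left( h\lambda_\pi(Y_1) \right)}{\Psi(h)}, \quad \Psi(h) = \mathrm{E}_{\mathrm{P}_0} \exp\left( h\lambda_\pi(Y_1) \right).$$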
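With the parameter $h > 0$ chosen to satisfy

$$\mathrm{E}_{\mathrm{P}_h} \lambda_\pi(Y_1) = H,$$

we have (see Lemma 2 in [13])

$$h = \frac{1}{2} + \frac{H}{u_\varepsilon^2}(1 + o(1)), \tag{56}$$

and (see formula (45) in [13])

$$\Psi(h) = \exp\left( \frac{h^2 - h}{2} u_\varepsilon^2 (1 + o(1)) \right), \tag{57}$$

where for notational simplicity we use $u_\varepsilon$ for $u_\varepsilon(r_\varepsilon)$.

We have

$$\begin{aligned}
B_\varepsilon &= \mathrm{E}_{\pi,1}\left( \mathbb{1}(\lambda_\pi(Y_1) < H) \right) = \mathrm{E}_{\pi,0}\left( \exp(\lambda_\pi(Y_1)) \, \mathbb{1}(\lambda_\pi(Y_1) < H) \right) \\
&= \mathrm{E}_{\mathrm{P}_h}\left( \frac{d\mathrm{P}_0}{d\mathrm{P}_h}(Y_1) \exp(\lambda_\pi(Y_1)) \, \mathbb{1}(\lambda_\pi(Y_1) < H) \right) \\
&= \Psi(h) \, \mathrm{E}_{\mathrm{P}_h}\left( \exp[(1-h)\lambda_\pi(Y_1)] \, \mathbb{1}(\lambda_\pi(Y_1) < H) \right).
\end{aligned} \tag{58}$$

By Lemma 3 in [13], the standardized random variable

$$\frac{\lambda_\pi - \mu_h}{\sigma_h},$$

where

$$\mu_h = \mathrm{E}_{\mathrm{P}_h}(\lambda_\pi) = u_\varepsilon^2 (h - 1/2)(1 + o(1)), \quad \sigma_h^2 = \mathrm{Var}_{\mathrm{P}_h}(\lambda_\pi) = u_\varepsilon^2 (1 + o(1)),$$

converges in $\mathrm{P}_h$-distribution to an $N(0,1)$. Therefore the statistic $\lambda_\pi(Y_1)$ on the right side of
(58) is nearly a normal $N(H, u_\varepsilon^2)$ random variable.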

Next, by assumption and the 'continuity' of $u_\varepsilon$ as stated in (7), for some constant $\delta_1 > 0$

$$u_\varepsilon / \sqrt{\log(d/s)} \le \sqrt{2}(1 - \delta_1),$$

provided $\varepsilon$ is small enough. This and formula (56) give the inequality $1 - h < 0$, which implies
for all $y$ and all sufficiently small $\varepsilon$

$$\exp[(1-h)\lambda_\pi(y)] \, \mathbb{1}(\lambda_\pi(y) < H) \le \exp[(1-h)H] \sim (d/s)^{1-h} \le \mathrm{const}.$$

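Then, by the dominated convergence theorem, the replacement of $\lambda_\pi(Y_1)$ by an $N(H, u_\varepsilon^2)$ on
the right side of (58) and the use of (56) and (57) yield for all sufficiently small $\varepsilon$

$$\begin{aligned}
B_\varepsilon &\sim \exp\left( \frac{h^2 - h}{2} u_\varepsilon^2 \right) \int_{-\infty}^{H} \exp\left( (1-h)x \right) \frac{1}{\sqrt{2\pi}\, u_\varepsilon} \exp\left( -\frac{(x - H)^2}{2u_\varepsilon^2} \right) dx \\
&= \exp\left( \frac{h^2 - h}{2} u_\varepsilon^2 + H(1-h) + \frac{(1-h)^2 u_\varepsilon^2}{2} \right) \int_{-\infty}^{H} \frac{1}{\sqrt{2\pi}\, u_\varepsilon} \exp\left( -\frac{\left( x - (H + (1-h)u_\varepsilon^2) \right)^2}{2u_\varepsilon^2} \right) dx \\
&\sim \exp(0) \int_{-\infty}^{H} \frac{1}{\sqrt{2\pi}\, u_\varepsilon} \exp\left( -\frac{\left( x - (H + (1-h)u_\varepsilon^2) \right)^2}{2u_\varepsilon^2} \right) dx \\
&\ge \int_{-\infty}^{H + (1-h)u_\varepsilon^2} \frac{1}{\sqrt{2\pi}\, u_\varepsilon} \exp\left( -\frac{\left( x - (H + (1-h)u_\varepsilon^2) \right)^2}{2u_\varepsilon^2} \right) dx = 1/2.
\end{aligned}$$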
From this

$$\liminf_{\varepsilon \to 0} R_\varepsilon \ge \liminf_{\varepsilon \to 0} B_\varepsilon \ge 1/2 > 0,$$

and the proof of Theorem 4 is complete. □
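
The step in which the exponential prefactor reduces to $\exp(0)$ rests on a completing-the-square identity that is easy to check numerically. The Python sketch below is an illustration only: it assumes the exponent has the form $\frac{h^2-h}{2}u_\varepsilon^2 + H(1-h) + \frac{(1-h)^2u_\varepsilon^2}{2}$ and ignores the $(1+o(1))$ factors; the function name `tilted_exponent` and the argument name `u2` (standing for $u_\varepsilon^2$) are hypothetical.

```python
def tilted_exponent(h, u2, H):
    """Exponent left after completing the square in the Gaussian integral."""
    return 0.5 * (h * h - h) * u2 + H * (1.0 - h) + 0.5 * (1.0 - h) ** 2 * u2


if __name__ == "__main__":
    h, u2 = 1.7, 9.0
    H = u2 * (h - 0.5)          # the calibration E_{P_h}(lambda_pi) = H
    print(tilted_exponent(h, u2, H))
```

Algebraically the exponent equals $(h-1)\left(u_\varepsilon^2(h - 1/2) - H\right)$, so it vanishes exactly when $H = u_\varepsilon^2(h - 1/2)$, which is the leading-order content of the choice of $h$.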


6 Concluding remarks 

In the context of variable selection in high dimensions, in both regression and white noise
settings, simple thresholding provides a plausible alternative to the lasso for a large range of
problems. As a statistical tool, the thresholding strategy is simple in nature and is not as
computationally demanding as the lasso, especially in very high dimensional problems. At the same
time, it is capable of performing at least as well as the lasso, or even better (see our Theorems 1 to
6, Theorems 9 to 11 in [6], and Theorems 1 and 2 in [13] for details). In light of these facts,
we support the viewpoint of Genovese et al. [6] that for sparse high-dimensional regression
problems a simple thresholding procedure merits further investigation.

To conclude our study, we point out possible directions for extending the results obtained
in this paper. For the two function spaces $\mathcal{F}_\sigma$ at hand, it might be of interest to produce
asymptotically exact and almost full selectors in very high dimensional settings when the
conditions $\log d = o(\varepsilon^{-2/(2\sigma+1)})$ and $\log d = o(\log \varepsilon^{-1})$ on the growth of $d$ as a function of $\varepsilon$
are violated.

The setup of inverse problems, where the observations are of the form $Kf + \varepsilon W$, with $K$ being
a linear operator such that $K^*K$ is compact, translates into a Gaussian sequence model with
heterogeneous observations $X_{j,k} = \eta_j\theta_{j,k} + \varepsilon v_k \xi_{j,k}$, where the $v_k$ are determined by the eigenvalues of $K^*K$. This
case, which extends our setup, can be treated by using the sharp testing results for the inverse
problems obtained in [14].

Furthermore, handling the problem of variable selection in a sequence space model, general 
ellipsoids {9 G ■ Ylkez^^k — ™ with semi-axes Ck decreasing fast enough, could 

be studied. A more complicated model, in which a $d$-variate regression function $f$ admits a
decomposition into a sum of $k$-variate components, with $k \ge 2$ and only a small number $s$ of
these components being non-zero, also deserves some attention.

Eliminating the assumption of a known parameter $\sigma$ leads to the problem of adapting the
proposed selection procedures to the possible values of $\sigma$.

To pursue more practical goals, one can try to translate the results obtained for an additive 
s-sparse Gaussian white noise model to the corresponding discrete regression model for which 
the corresponding detection problem was solved in [1]. 